Modeling Customer Lifetime With Dynamic Customer Feedback Information

New Perspectives in Business and Econometrics

Alexander Kulumbeg

Marketing Institutes MCA & RDS

Daniel Winkler

Introduction

Story

  • Contractual setting - curated shopping
  • Nation-wide apparel subscription box service provider
  • Female customers only


  • Monthly surprise boxes with clothes selected by a stylist (person)
  • Option for customer to approve or change something in the box
  • Once received - rating of each item by categories and with optional written feedback

Story II

Idea

  • Propensity to churn changes over time
  • Traditionally, customer data is
    • hard to obtain
    • static / collected once
  • Written feedback contains (un)conscious pieces of information
  • Feedback changes over time
    • Stylist did a better/worse job than before
    • Clothes’ color/fit/cut/size/material is good/bad
    • Items did/didn’t adhere to the customer preferences stated in the quiz

Problem

  • What is hiding in the dynamic feedback (e.g., emotionality, eloquence, engagement…)?
  • How do these components influence the risk of customer attrition?
  • Can we identify other (latent) time-varying signals that affect customer lifetime?

Data

  • Information on
    • Orders
    • Feedback
    • App usage
    • Customer journey
    • Style preferences
    • Stylist performance
    • Previews of Boxes
  • ca. 57,000 unique customers
  • ca. 260,000 transactions
  • ca. 1,050,000 feedback items
  • Distilled into a box-level dataframe
    • User demographics
    • User contract length
    • User lifetime spending
    • Box-level feedback variables
      • Word count
      • Sentiment
      • Eloquence
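A minimal sketch of how item-level feedback could be distilled into box-level variables. The field names (`box_id`, `text`), the toy rows, and the use of a type-token ratio as a stand-in for eloquence are all assumptions made for illustration; the slides do not define how sentiment or eloquence are measured, so sentiment is left out here.

```python
from collections import defaultdict

# Invented toy rows, NOT taken from the actual data set.
feedback_items = [
    {"box_id": 1, "text": "Love the colour but the fit is too loose"},
    {"box_id": 1, "text": "Too loose"},
    {"box_id": 2, "text": "Perfect"},
]

def box_level_features(items):
    """Aggregate item-level feedback texts into one record per box."""
    tokens_by_box = defaultdict(list)
    for item in items:
        tokens_by_box[item["box_id"]].extend(item["text"].lower().split())

    features = {}
    for box_id, tokens in tokens_by_box.items():
        n = len(tokens)
        features[box_id] = {
            "word_count": n,
            # Hypothetical eloquence proxy: share of distinct words.
            "type_token_ratio": len(set(tokens)) / n if n else 0.0,
        }
    return features

features = box_level_features(feedback_items)
```

Each box then contributes one row with its feedback variables, which can be joined with the user-level variables listed above.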

Model

Causal Model

Model Details

A Bayesian Model for Time-Varying Parameters


A piecewise exponential model for lifetimes.

  • A given set \(\mathcal{S}=\left\{s_{0}=0, s_{1}, \ldots, s_{J}\right\}\) with \(s_{0}<s_{1}<\cdots<s_{J}\) partitions the time axis into \(J\) intervals \(\left(s_{0}, s_{1}\right], \ldots,\left(s_{J-1}, s_{J}\right]\)

  • The hazard is constant within each interval

\[ \lambda(t|\boldsymbol z_i; t \in (s_{j-1}, s_j]) = \lambda_{ij} = \exp\left(\beta_{0j} + \sum_{k=1}^{K} z_{i k} \beta_{kj}\right) \]
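The piecewise-constant hazard can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the cut points, coefficients, and covariate value are made up, and the function names are hypothetical. The cumulative hazard is the hazard in each interval times the time spent in it, and the survival probability is its negative exponential.

```python
import math
from bisect import bisect_left

def interval_index(t, cuts):
    """Return j such that t lies in (cuts[j-1], cuts[j]], j = 1..J."""
    if not (cuts[0] < t <= cuts[-1]):
        raise ValueError("t must lie in (s_0, s_J]")
    return bisect_left(cuts, t)

def hazard(z, beta0_j, beta_j):
    """lambda_ij = exp(beta_0j + sum_k z_ik * beta_kj)."""
    return math.exp(beta0_j + sum(zk * bk for zk, bk in zip(z, beta_j)))

def cumulative_hazard(t, z, cuts, beta0, beta):
    """Integrate the piecewise-constant hazard from 0 to t."""
    total = 0.0
    for j in range(1, len(cuts)):
        lo, hi = cuts[j - 1], cuts[j]
        if t <= lo:
            break
        total += hazard(z, beta0[j - 1], beta[j - 1]) * (min(t, hi) - lo)
    return total

def survival(t, z, cuts, beta0, beta):
    """S(t | z) = exp(-cumulative hazard up to t)."""
    return math.exp(-cumulative_hazard(t, z, cuts, beta0, beta))

# Illustrative values only: J = 3 intervals, K = 1 covariate.
cuts = [0.0, 1.0, 2.0, 4.0]
beta0 = [math.log(0.1), math.log(0.2), math.log(0.4)]
beta = [[0.0], [0.0], [0.0]]   # covariate has no effect in this toy case
z = [1.0]

H3 = cumulative_hazard(3.0, z, cuts, beta0, beta)  # 0.1*1 + 0.2*1 + 0.4*1
S3 = survival(3.0, z, cuts, beta0, beta)
```

Because intervals are right-closed, a time exactly equal to a cut point \(s_j\) is assigned to interval \(j\), not \(j+1\).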

Piecewise Exponential Model

Evolution of the \(\beta_{kj}\)’s

As in Hemming and Shaw (2002), each coefficient follows a Gaussian random walk with initial state \(\beta_{k 0} \sim \mathcal{N}\left({\beta_{k}}, {\theta_{k}}\right)\): \[ \beta_{k j}=\beta_{k, j-1}+w_{k j}, \quad w_{k j} \sim \mathcal{N}\left(0, {\theta_{k}}\right). \]
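A short simulation may help to see what the innovation variance \(\theta_k\) does. This is a sketch with made-up parameter values and a hypothetical function name: a positive \(\theta_k\) lets the coefficient drift across intervals, while \(\theta_k = 0\) collapses the path to the constant \(\beta_k\), i.e. a static effect.

```python
import random

def simulate_beta_path(J, beta_k, theta_k, rng):
    """Draw beta_k0, ..., beta_kJ from the Gaussian random walk."""
    sd = theta_k ** 0.5                 # theta_k is a variance
    path = [rng.gauss(beta_k, sd)]      # initial state beta_k0
    for _ in range(J):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

rng = random.Random(1)
# Illustrative values only.
dynamic = simulate_beta_path(J=12, beta_k=0.5, theta_k=0.04, rng=rng)
static = simulate_beta_path(J=12, beta_k=0.5, theta_k=0.0, rng=rng)
```

This is why shrinkage on \(\theta_k\) matters: pulling it towards zero decides whether a covariate effect is treated as time-varying or fixed.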


Priors on Innovation Variances and Initial Value Means

Triple gamma priors (Cadonna, Frühwirth-Schnatter, and Knaus 2020) are placed on both \(\beta_k\) and \(\theta_k\). The name stems from the fact that, when used for variances, the prior can be represented as a compound distribution of three gamma distributions:

\[ \begin{aligned} \theta_{k}\mid{\xi}_{k}^{2} \sim \mathcal{G}\left(\frac{1}{2}, \frac{1}{2 \xi_{k}^{2}}\right), \quad& \xi_{k}^{2}\mid a^{\xi}, \kappa_{k}^{2} \sim \mathcal{G}\left(a^{\xi}, \frac{a^{\xi} \kappa_{k}^{2}}{2}\right), \\ \kappa_{k}^{2} \mid c^{\xi}, \kappa_{B}^{2} &\sim \mathcal{G}\left(c^{\xi}, \frac{c^{\xi}}{\kappa_{B}^{2}}\right). \end{aligned} \]

The first-stage conditional prior on \(\theta_k\) implies the following conditional prior on \(\sqrt{\theta_k}\): \[ \sqrt{\theta_k} \mid \xi_k^2\sim \mathcal{N}\left(0, \xi_k^2\right) \]
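The hierarchy can be sampled directly, which also gives a quick numerical check of the first-stage equivalence. This sketch assumes \(\mathcal{G}(a, b)\) denotes the shape-rate parameterization; Python's `random.gammavariate` takes shape and scale, so every rate is inverted. The hyperparameter values and function name are illustrative assumptions.

```python
import random

def draw_theta(a_xi, c_xi, kappa_B2, rng):
    """One draw of theta_k from the three-stage gamma hierarchy."""
    # kappa_k^2 ~ G(c, c / kappa_B^2)      -> scale = kappa_B^2 / c
    kappa2 = rng.gammavariate(c_xi, kappa_B2 / c_xi)
    # xi_k^2 ~ G(a, a * kappa_k^2 / 2)     -> scale = 2 / (a * kappa_k^2)
    xi2 = rng.gammavariate(a_xi, 2.0 / (a_xi * kappa2))
    # theta_k ~ G(1/2, 1 / (2 * xi_k^2))   -> scale = 2 * xi_k^2
    return rng.gammavariate(0.5, 2.0 * xi2)

rng = random.Random(42)
draws = [draw_theta(0.5, 0.5, 2.0, rng) for _ in range(1000)]

# First-stage check for a fixed xi^2: the gamma draw and the square of a
# N(0, xi^2) draw should have the same distribution, with mean xi^2.
xi2 = 0.5
n = 20000
theta_gamma = [rng.gammavariate(0.5, 2.0 * xi2) for _ in range(n)]
theta_normal = [rng.gauss(0.0, xi2 ** 0.5) ** 2 for _ in range(n)]
mean_gamma = sum(theta_gamma) / n
mean_normal = sum(theta_normal) / n
```

Both sample means should land close to \(\xi_k^2\), since a \(\mathcal{G}(1/2, 1/(2\xi_k^2))\) variable is \(\xi_k^2\) times a chi-squared variable with one degree of freedom.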

Adding a Factor (?)

To account for unobserved heterogeneity in the data, a grouped factor component can be added to the hazard rates. Let observation \(i\) belong to group \(g\), with \(g \in\{1, \ldots, G\}\). Then the hazard rates look as follows: \[ \lambda_{i j}=\exp \left(\phi_{g} f_{j}+\beta_{0 j}+\sum_{k=1}^{K} z_{i k} \beta_{k j}\right), \] where \(f_{j}\) is allowed to vary over time according to a zero-mean stochastic volatility law of motion: \[ \begin{aligned} f_{j} & \sim \mathcal{N}\left(0, e^{h_{j}}\right), \\ h_{j} \mid h_{j-1}, \phi_{f}, \sigma_{f}^{2} & \sim \mathcal{N}\left(\phi_{f} h_{j-1}, \sigma_{f}^{2}\right),\\ h_{0} & \sim \mathcal{N}\left(0, \sigma_{f}^{2} /\left(1-\phi_{f}^{2}\right)\right) . \end{aligned} \]
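The factor dynamics can be simulated as follows. This is a sketch with illustrative parameter values and hypothetical function names; it assumes \(|\phi_f| < 1\), which the stationary distribution of \(h_0\) requires. The log-variance \(h_j\) follows an AR(1) process, and \(f_j\) is drawn with variance \(e^{h_j}\).

```python
import math
import random

def simulate_factor(J, phi_f, sigma_f2, rng):
    """Draw f_1, ..., f_J from the zero-mean stochastic volatility process."""
    if abs(phi_f) >= 1.0:
        raise ValueError("phi_f must satisfy |phi_f| < 1")
    sd = math.sqrt(sigma_f2)
    h = rng.gauss(0.0, sd / math.sqrt(1.0 - phi_f ** 2))  # stationary h_0
    f = []
    for _ in range(J):
        h = rng.gauss(phi_f * h, sd)                  # h_j | h_{j-1}
        f.append(rng.gauss(0.0, math.exp(h / 2.0)))   # Var(f_j) = exp(h_j)
    return f

def hazard_with_factor(z, beta0_j, beta_j, phi_g, f_j):
    """lambda_ij = exp(phi_g * f_j + beta_0j + sum_k z_ik * beta_kj)."""
    linear = beta0_j + sum(zk * bk for zk, bk in zip(z, beta_j))
    return math.exp(phi_g * f_j + linear)

rng = random.Random(7)
f = simulate_factor(J=12, phi_f=0.9, sigma_f2=0.1, rng=rng)

# Two customers with identical covariates but different group loadings.
lam_a = hazard_with_factor([1.0], -2.0, [0.3], phi_g=1.0, f_j=f[0])
lam_b = hazard_with_factor([1.0], -2.0, [0.3], phi_g=0.0, f_j=f[0])
```

For customers with the same covariates, the hazard ratio between two groups in interval \(j\) is \(\exp\left((\phi_{g}-\phi_{g'}) f_{j}\right)\), so the loadings \(\phi_g\) control how strongly each group reacts to the common shock.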

Results I


Results II


Results III


Results IV


Conclusion

Discussion

References

Cadonna, Annalisa, Sylvia Frühwirth-Schnatter, and Peter Knaus. 2020. “Triple the Gamma—A Unifying Shrinkage Prior for Variance and Variable Selection in Sparse State Space and TVP Models.” Econometrics 8 (2): 20.
Gamerman, Dani. 1991. “Dynamic Bayesian Models for Survival Data.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 40 (1): 63–79.
Griffin, Jim, Phil Brown, et al. 2017. “Hierarchical Shrinkage Priors for Regression Models.” Bayesian Analysis 12 (1): 135–59.
Hemming, Karla, and Ewart Shaw. 2002. “A Parametric Dynamic Survival Model Applied to Breast Cancer Survival Times.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 51 (4): 421–35.
Hosszejni, Darjus, and Gregor Kastner. 2021. “Modeling Univariate and Multivariate Stochastic Volatility in R with stochvol and factorstochvol.” Journal of Statistical Software 100: 1–34.
Wagner, Helga. 2011. “Bayesian Estimation and Stochastic Model Specification Search for Dynamic Survival Models.” Statistics and Computing 21 (2): 231–46.